In the Claims:

The following is a list of claims pending in this application and their current 
status. This listing replaces all prior versions and listings. For printing purposes, we 
have increased the font size of sub and superscripts in the equations that appear in a 
few claims, without changing any of the texts. 

1. (Previously presented) A computer assisted method of auditing a
superset of training data, the superset comprising examples of documents having one 
or more preexisting category assignments, the method including: 

partitioning the superset into at least two disjoint sets, including a test set and a 
training set, wherein the test set includes one or more test documents and the 
training set includes examples of documents belonging to at least two categories; 

automatically categorizing the test documents using the training set; 

calculating a metric of confidence based on results of the categorizing step and 
comparing the automatic category assignments for the test documents to the 
preexisting category assignments; and 

reporting the test documents and preexisting category assignments that are 
suspicious and the automatic category assignments that appear to be missing from 
the test documents, based on the metric of confidence. 
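By way of non-limiting illustration, the partitioning, categorizing, calculating and reporting steps recited above might be sketched as follows. The sketch assumes single-document test sets (each document is held out in turn), a Jaccard term-overlap similarity, and a fixed confidence threshold; the names `overlap`, `audit`, `k` and `threshold` and the sample data are hypothetical and are not drawn from the claims.

```python
from collections import defaultdict


def overlap(a, b):
    """Hypothetical similarity: Jaccard overlap of two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def audit(superset, k=3, threshold=0.6):
    """superset: list of (doc_id, term_set, preexisting_category_set).

    Returns (suspicious, missing), each a list of (doc_id, category, confidence).
    """
    suspicious, missing = [], []
    for i, (doc_id, terms, preexisting) in enumerate(superset):
        # Partition: this document is the test set, the rest is the training set.
        training = superset[:i] + superset[i + 1:]
        # Categorize: take the k training documents most similar to the test document.
        neighbors = sorted(
            training, key=lambda s: overlap(terms, s[1]), reverse=True
        )[:k]
        total = sum(overlap(terms, s[1]) for s in neighbors)
        scores = defaultdict(float)
        for _, n_terms, n_cats in neighbors:
            for cat in n_cats:
                scores[cat] += overlap(terms, n_terms)
        # Confidence: each category's share of the neighborhood similarity.
        confidence = {c: (v / total if total else 0.0) for c, v in scores.items()}
        automatic = {c for c, v in confidence.items() if v >= threshold}
        # Compare automatic assignments with preexisting ones and report.
        for cat in sorted(preexisting - automatic):
            suspicious.append((doc_id, cat, confidence.get(cat, 0.0)))
        for cat in sorted(automatic - preexisting):
            missing.append((doc_id, cat, confidence[cat]))
    return suspicious, missing


# Hypothetical sample data: d3 reads like a sports document but is labeled finance.
docs = [
    ("d1", {"goal", "match", "team"}, {"sports"}),
    ("d2", {"goal", "team", "score"}, {"sports"}),
    ("d3", {"match", "team", "score"}, {"finance"}),
    ("d4", {"goal", "match", "score"}, {"sports"}),
    ("d5", {"stock", "bond", "market"}, {"finance"}),
    ("d6", {"stock", "market", "rate"}, {"finance"}),
    ("d7", {"bond", "rate", "market"}, {"finance"}),
    ("d8", {"stock", "bond", "rate"}, {"finance"}),
]
suspicious, missing = audit(docs, k=3, threshold=0.6)
```

With this sample data, the preexisting "finance" assignment of d3 is reported as suspicious and a "sports" assignment is reported as apparently missing from d3; no other document is flagged.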

2. (Original) The method of claim 1, further including repeating the 
partitioning, categorizing and calculating steps until at least one-half of the 
documents in the superset have been assigned to the test set. 

3. (Original) The method of claim 2, wherein the test set created in the 
partition step has a single test document. 

4. (Original) The method of claim 2, wherein the test set created in the 
partition step has a plurality of test documents. 

5. (Currently amended) The method of claim 1, further including repeating
the partitioning, categorizing and calculating steps until [[substantially all]] more than 50
percent of the documents in the superset have been assigned to the test set.

6. (Currently amended) The method of claim 1, wherein the partitioning,
categorizing and calculating steps are carried out automatically [[substantially without
user intervention]].

7. (Currently amended) The method of claim 5, wherein the partitioning,
categorizing and calculating steps are carried out automatically [[substantially without
user intervention]].

8. (Currently amended) The method of claim 1, wherein the partitioning,
categorizing, calculating and reporting steps are carried out automatically [[substantially
without user intervention]].

9. (Currently amended) The method of claim 5, wherein the partitioning,
categorizing, calculating and reporting steps are carried out automatically [[substantially
without user intervention]].

10. (Original) The method of claim 1, wherein the categorizing step includes
determining k nearest neighbors of the test documents and the calculating step is based 
on a k nearest neighbors categorization logic. 

11. (Previously presented) The method of claim 10, wherein the metric of
confidence is an unweighted measure of distance between a particular test document 
and the examples of documents belonging to various categories. 

12. (Previously presented) The method of claim 11, where the unweighted
measure includes application of a relationship

$$\Omega_0(d_t, T_m) = \sum_{d \in \{K(d_t) \cap T_m\}} s(d_t, d),$$

wherein

$\Omega_0$ is a function of the particular test document represented by a feature
vector $d_t$ and of various categories $T_m$; and

$s$ is a metric of distance between the particular test document feature vector $d_t$
and certain sample documents represented by feature vectors $d$, the certain
sample documents being among a set of k nearest neighbors of the particular
test document having category assignments to the various categories $T_m$.

13. (Previously presented) The method of claim 10, wherein the metric of 
confidence is a weighted measure of distance between a particular test document and 
the examples of documents belonging to various categories, the weighted measure 
taking into account the density of a neighborhood of the test document. 

14. (Previously presented) The method of claim 13, wherein the weighted
measure includes application of a relationship

$$\Omega_1(d_t, T_m) = \frac{\sum_{d_1 \in \{K(d_t) \cap T_m\}} s(d_t, d_1)}{\sum_{d_2 \in K(d_t)} s(d_t, d_2)},$$

wherein

$\Omega_1$ is a function of the test document represented by a feature vector $d_t$ and
of various categories $T_m$; and

$s$ is a metric of distance between the test document feature vector $d_t$ and certain
sample documents represented by feature vectors $d_1$ and $d_2$, the certain sample
documents $d_1$ being among a set of k nearest neighbors of the test document
having category assignments to the various categories $T_m$ and the certain sample
documents $d_2$ being among a set of k nearest neighbors of the test document.
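
By way of non-limiting illustration, the unweighted and weighted measures of claims 12 and 14 might be computed as sketched below. The claims say only that s is a metric of distance; the sketch assumes s is cosine similarity over sparse feature vectors, so that nearer neighbors contribute larger values. The names `similarity`, `k_nearest`, `omega_0` and `omega_1` and the sample data are hypothetical.

```python
import math


def similarity(a, b):
    """Assumed choice for s: cosine similarity of two sparse vectors (dicts)."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def k_nearest(d_t, samples, k):
    """K(d_t): the k samples most similar to d_t.

    samples is a list of (feature_vector, set_of_categories) pairs.
    """
    return sorted(samples, key=lambda s: similarity(d_t, s[0]), reverse=True)[:k]


def omega_0(d_t, category, samples, k):
    """Unweighted measure: sum of s over neighbors assigned to the category."""
    return sum(
        similarity(d_t, vec)
        for vec, cats in k_nearest(d_t, samples, k)
        if category in cats
    )


def omega_1(d_t, category, samples, k):
    """Weighted measure: omega_0 divided by the sum of s over all k neighbors."""
    neighbors = k_nearest(d_t, samples, k)
    total = sum(similarity(d_t, vec) for vec, _ in neighbors)
    if total == 0:
        return 0.0
    in_category = sum(
        similarity(d_t, vec) for vec, cats in neighbors if category in cats
    )
    return in_category / total


# Hypothetical sample data: three labeled vectors and one test vector.
samples = [
    ({"a": 1.0}, {"X"}),
    ({"a": 1.0, "b": 1.0}, {"Y"}),
    ({"b": 1.0}, {"Y"}),
]
d_t = {"a": 1.0}
```

Because the denominator of the weighted measure sums s over the whole neighborhood, its values for categories that do not share neighbors add up to 1, which makes scores comparable between documents in dense and sparse neighborhoods.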

15. (Previously presented) The method of claim 1, wherein the reporting step
further includes filtering the reported test documents based on the metric of confidence. 

16. (Previously presented) The method of claim 15, wherein filtering further 
includes color coding the reported test documents based on the metric of confidence. 

17. (Previously presented) The method of claim 15, wherein filtering further 
includes selecting for display the reported test documents based on the metric of 
confidence. 

18. (Previously presented) The method of claim 1, wherein reporting includes 
generating a printed report. 

19. (Previously presented) The method of claim 1, wherein reporting includes
generating a file conforming to XML syntax. 

20. (Previously presented) The method of claim 1, wherein reporting includes
generating a sorted display identifying at least a portion of the test documents. 

21. (Previously presented) The method of claim 1, further including
calculating a precision score for the reported test documents. 

22. (Previously presented) A computer assisted method of auditing a 
superset of training data, the superset comprising examples of documents having one 
or more preexisting category assignments, the method including: 

determining k nearest neighbors of the documents in a test subset automatically 
partitioned from the superset; 

automatically categorizing the documents based on the k nearest neighbors into
a plurality of categories; 

calculating a metric of confidence based on results of the categorizing step and 
comparing the automatic category assignments for the documents to the 
preexisting category assignments; and 

reporting the documents in the test subset and preexisting category assignments 
that are suspicious and the automatic category assignments that appear to be 
missing from the documents in the test subset, based on the metric of 
confidence. 

23. (Previously presented) The method of claim 22, wherein the metric of 
confidence is an unweighted measure of distance between a particular test document 
and the examples of documents belonging to various categories.

24. (Original) The method of claim 23, where the unweighted measure
includes application of a relationship

$$\Omega_0(d_t, T_m) = \sum_{d \in \{K(d_t) \cap T_m\}} s(d_t, d),$$

wherein

$\Omega_0$ is a function of the test document represented by a feature vector $d_t$ and
of various categories $T_m$; and

$s$ is a metric of distance between the test document feature vector $d_t$ and certain
sample documents represented by feature vectors $d$, the certain sample
documents being among a set of k nearest neighbors of the test document
having category assignments to the various categories $T_m$.

25. (Previously presented) The method of claim 22, wherein the metric of
confidence is a weighted measure of distance between a particular test document and 
the examples of documents belonging to various categories, the weighted measure 
taking into account the density of a neighborhood of the test document. 

26. (Original) The method of claim 25, wherein the weighted measure
includes application of a relationship

$$\Omega_1(d_t, T_m) = \frac{\sum_{d_1 \in \{K(d_t) \cap T_m\}} s(d_t, d_1)}{\sum_{d_2 \in K(d_t)} s(d_t, d_2)},$$

wherein

$\Omega_1$ is a function of the test document represented by a feature vector $d_t$ and
of various categories $T_m$; and

$s$ is a metric of distance between the test document feature vector $d_t$ and certain
sample documents represented by feature vectors $d_1$ and $d_2$, the certain sample
documents $d_1$ being among a set of k nearest neighbors of the test document
having category assignments to the various categories $T_m$ and the certain sample
documents $d_2$ being among a set of k nearest neighbors of the test document.

27. (Currently amended) The method of claim 22, wherein the determining,
categorizing and calculating steps are carried out automatically [[substantially without
user intervention]].

28. (Previously presented) The method of claim 22, wherein the reporting 
step further includes filtering the documents based on the metric of confidence. 

29. (Previously presented) The method of claim 28, wherein the filtering step 
further includes color coding the reported documents based on the metric of confidence. 

30. (Previously presented) The method of claim 28, wherein the filtering step
further includes selecting for display the reported documents based on the metric of 
confidence. 

31. (Previously presented) The method of claim 22, wherein reporting
includes generating a printed report. 

32. (Previously presented) The method of claim 22, wherein reporting
includes generating a file conforming to XML syntax. 

33. (Previously presented) The method of claim 22, wherein reporting 
includes generating a sorted display identifying at least a portion of the documents. 

34. (Previously presented) The method of claim 22, further including 
calculating a precision score for the reported documents. 